IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 3, MARCH 2012, p. 1339

Web and Personal Image Annotation by Mining Label Correlation With Relaxed Visual Graph Embedding

Yi Yang, Fei Wu, Feiping Nie, Heng Tao Shen, Yueting Zhuang, and Alexander G. Hauptmann

Abstract— The number of digital images is increasing rapidly, and organizing these resources effectively has become an important challenge. As a way to facilitate image categorization and retrieval, automatic image annotation has received much research attention. Considering that there are a great number of unlabeled images available, it is beneficial to develop an effective mechanism to leverage unlabeled images for large-scale image annotation. Meanwhile, a single image is usually associated with multiple labels, which are inherently correlated to each other. A straightforward method of image annotation is to decompose the problem into multiple independent single-label problems, but this ignores the underlying correlations among different labels. In this paper, we propose a new inductive algorithm for image annotation by integrating label correlation mining and visual similarity mining into a joint framework. We first construct a graph model according to image visual features. A multilabel classifier is then trained by simultaneously uncovering the shared structure common to different labels and the visual graph embedded label prediction matrix for image annotation. We show that the globally optimal solution of the proposed framework can be obtained by performing generalized eigen-decomposition. We apply the proposed framework to both web image annotation and personal album labeling using the NUS-WIDE, MSRA MM 2.0, and Kodak image data sets, and the AUC evaluation metric. Extensive experiments on large-scale image databases collected from the web and personal albums show that the proposed algorithm is capable of utilizing both labeled and unlabeled data for image annotation and outperforms other algorithms.

Index Terms— Label correlation mining, multilabel learning, personal album labeling, semisupervised learning, web image annotation.

Manuscript received September 15, 2010; revised July 14, 2011 and September 01, 2011; accepted September 02, 2011. Date of publication September 22, 2011; date of current version February 17, 2012. This work was supported in part by the Natural Science Foundation of China under Grant 90920303, by the 973 Program under Grant 2010CB327900, by the National Science Foundation under Grant CNS-0751185, and by the National Science Foundation under Grant IIS-0917072. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Miles N. Wernick.
Y. Yang and A. G. Hauptmann are with the School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3890 USA.
F. Wu and Y. Zhuang are with the College of Computer Science, Zhejiang University, Hangzhou 310027, China.
F. Nie is with the Department of Computer Science and Engineering, University of Texas, Arlington, TX 76019-0015 USA.
H. T. Shen is with the School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Qld. 4072, Australia.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2011.2169269

I. INTRODUCTION

With the development of computer network and storage technologies, we have witnessed explosive growth of web images. Large numbers of digital images are generated, shared, and accessed on different websites, e.g., Flickr. With the popularity of digital cameras, we are able to create personal photos easily. Consequently, the size of personal albums is getting larger. The growing number of web and personal images requires an effective retrieval and browsing mechanism in either a content- or keyword-based manner.
Much research effort has been focused on this area during recent years, resulting in remarkable achievements [1], [2]. Among others, automatic image annotation technology, which associates images with labels or tags, has received much research interest [3]. Automatic image annotation enables conversion of image retrieval into text matching. Indexing and retrieval of text documents are faster and usually more accurate than that of raw multimedia data. Image annotation thus brings several benefits in image retrieval, such as high efficiency and accuracy.

Image annotation is essentially a classification problem. In the field of multimedia and computer vision, many researchers have proposed a variety of machine learning and data mining algorithms for automatic image annotation recently [4]–[7]. These works have shown promising achievements in overcoming the well-known semantic gap by applying machine learning algorithms to image annotation. Generally speaking, these approaches can be roughly divided into the following two groups:

The approaches in the first group are usually referred to as a tagging or retrieval-based paradigm. Image tagging approaches usually annotate images by leveraging web images, which are associated with user-defined tags. Typically, tagging approaches can be divided into two phases, i.e., a searching phase and a mining-for-tags phase. Tagging approaches first search for similar images from web-scale data sets and then mine the textual information associated with the retrieved images for image annotation. Generally, there are three major research issues in image tagging: First, how to design an efficient indexing and matching algorithm for fast search over large-scale web image data sets; second, how to define accurate metrics for the retrieval process; and, third, how to utilize the search results for image tagging. For example, in [7], an efficient hashing scheme is proposed for image tagging.
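
As an illustration only, the two-phase tagging paradigm described above can be sketched in a few lines of Python. This is not the system of [7] or of any cited work. The brute-force Euclidean scan stands in for an efficient index such as hashing, the simple vote count stands in for a real tag-mining step, and the function names (`retrieve_neighbors`, `mine_tags`, `annotate`) and the toy two-dimensional features are all hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_neighbors(query, database, k):
    """Searching phase: return the k tagged images closest to the query.

    A linear scan is used here for clarity; a web-scale system would
    replace it with an index (assumption, not taken from the paper).
    """
    ranked = sorted(database, key=lambda item: euclidean(query, item[0]))
    return ranked[:k]


def mine_tags(neighbors, top_n):
    """Mining-for-tags phase: each neighbor votes once for each of its tags."""
    votes = Counter()
    for _, tags in neighbors:
        votes.update(set(tags))
    # Sort by vote count (descending), then alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


def annotate(query, database, k=3, top_n=2):
    """Annotate a query image from the tags of its retrieved neighbors."""
    return mine_tags(retrieve_neighbors(query, database, k), top_n)


# Toy database of (feature vector, user-defined tags) pairs.
database = [
    ((0.90, 0.10), ["beach", "sea"]),
    ((0.80, 0.20), ["beach", "sky"]),
    ((0.85, 0.15), ["sea", "beach"]),
    ((0.10, 0.90), ["forest", "tree"]),
    ((0.20, 0.80), ["forest"]),
]

print(annotate((0.88, 0.12), database))
```

The three research issues listed above map onto this sketch directly: the linear scan is the indexing and matching component, `euclidean` is the retrieval metric, and `mine_tags` is the step that turns search results into annotations.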
The system in [7] first searches for semantically and visually similar images from the web and then annotates images by mining the search results. In [8], a multiple-feature distance metric learning algorithm was proposed for cartoon image retrieval. Wu et al. proposed a probabilistic distance metric learning scheme for retrieval-based image annotation [9]. Because web images with user-generated tags are comparatively easy to obtain, image tagging has the advantage that less human labor is required. However, the automatically acquired images and tags are essentially noisy and incomplete [10]. Considering that the performance directly depends on the

1057-7149/$26.00 © 2011 IEEE